The social networks that infectious diseases spread along are typically clustered. Because of the 
close relation between percolation and epidemic spread, the behavior of percolation in such networks 
gives insight into infectious disease dynamics. A number of authors have studied clustered networks, 
but the networks often contain preferential mixing between high degree nodes. We introduce a class 
of random clustered networks and another class of random unclustered networks with the same 
preferential mixing. We analytically show that clustering reduces the component sizes under percolation 
and increases the epidemic threshold compared to the unclustered networks. 



Classical random networks contain few short cycles, and the proportion
of nodes in short cycles goes to zero as the number of nodes increases.
In contrast, social networks typically contain many short cycles. We
refer to such networks as clustered networks. The impact of clustering
on percolation properties is usually difficult to calculate because
cycles prevent the use of branching process arguments, but it is widely
expected that clustering significantly alters percolation.

Typically, studies of infectious disease spread assume that outbreaks
begin with a single infected node. The disease travels to each
susceptible neighbor independently with probability T, the
transmissibility, after which the infected node recovers. The process
repeats from each newly infected node. We focus on diseases for which
recovery provides immunity, so recovered nodes are not susceptible.
Typically the outbreak either dies out stochastically or becomes an
epidemic and spreads until the number of susceptible nodes is
sufficiently reduced.
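
The outbreak process described above can be sketched in a few lines. This is a minimal illustration, not the simulation code used for the results: the adjacency-dictionary input, the function name `simulate_outbreak`, and the six-node ring used as an example are all assumptions made here.

```python
import random
from collections import deque

def simulate_outbreak(adjacency, index_case, T, rng=random):
    """Return the set of nodes ever infected in one outbreak.

    adjacency maps each node to a list of its neighbors (assumed input
    format). Each infected node transmits to each still-susceptible
    neighbor independently with probability T and then recovers.
    """
    infected = {index_case}          # infected or recovered so far
    queue = deque([index_case])      # nodes that still have to transmit
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in infected and rng.random() < T:
                infected.add(v)
                queue.append(v)
        # u has now recovered; it is immune and never rejoins the queue
    return infected

# Hypothetical example network: a ring of six nodes.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
outbreak = simulate_outbreak(ring, index_case=0, T=0.5)
```

Repeating the simulation many times for a fixed T gives an estimate of both the probability that an outbreak becomes large and the typical size of large outbreaks.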

It is well established that for fixed T, the epidemic spread can be
mapped to a bond percolation problem wherein each edge is kept with
probability T [16]. If we perform percolation on the network and then
choose the initial infection, the disease spreads from that initial
infection along edges of the percolated network, and so an epidemic
occurs if and only if the initial node is in the giant component. The
size of the epidemic matches the size of the giant component. This
establishes that the probability of an epidemic and the fraction
infected in an epidemic are equal if T is fixed and all edges are
independent [22].
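
The percolation picture can be illustrated with a short sketch, assuming the network is given as a list of undirected edges. The function names `percolate` and `component_of` and the four-node path are hypothetical choices made for this example.

```python
import random

def percolate(edges, T, rng=random):
    """Keep each undirected edge independently with probability T."""
    return [edge for edge in edges if rng.random() < T]

def component_of(node, edges):
    """Return the set of nodes reachable from `node` along `edges`."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, []).append(v)
        neighbors.setdefault(v, []).append(u)
    seen, stack = {node}, [node]
    while stack:
        u = stack.pop()
        for v in neighbors.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

# Hypothetical example: a path 0 - 1 - 2 - 3.
path = [(0, 1), (1, 2), (2, 3)]
outbreak = component_of(0, percolate(path, T=0.5))
```

In this picture the nodes infected in an outbreak started at the index case are exactly the nodes in its component of the percolated network.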

Because social networks frequently exhibit clustering, a number of
studies have investigated the impact of clustering on epidemic
problems. Some have found that clustering reduces the sizes of
epidemics and raises the epidemic threshold. That is, clustering
reduces the size of giant components and raises the percolation
threshold. However, others have shown that clustering appears to reduce
the threshold. If so, epidemics would be possible at lower
transmissibility in the presence of clustering.

This discrepancy occurs because there are many ways to generate
clustered networks, and each network class results in different
behavior. It is difficult to separate the impact of clustering from
other features introduced by the network generation process.

In this article we introduce a new algorithm to generate random
clustered networks [53]. The clustered networks have correlations
between degrees in a well-defined manner which can lead to
assortativity, the tendency for nodes to contact nodes of similar
degree. We show how to generate unclustered networks with the same
correlations. We can make analytic comparisons between the two, and so
clearly separate the effect of clustering from that of degree
correlations. We show that although the clustered networks can have a
reduced threshold compared to purely random networks of the same degree
distribution, that reduction is entirely an artifact of the
assortativity. Compared to an unclustered network with the same degree
correlations, the clustered networks result in smaller epidemics and a
higher epidemic threshold.

This article is organized as follows: we first introduce our clustered
and unclustered networks. We then calculate and compare the
epidemiological quantity $\mathcal{R}_0$, which measures how many new
infections a typical infected node causes. Finally, we calculate the
final size and probability of epidemics assuming constant T.



I. THE NETWORKS 

We model our approach after standard algorithms for Configuration Model
(CM) networks. CM networks are useful because all edges from a node are
independent of one another, in the sense that whether an epidemic
results from following one edge is independent of the result along any
other edge because short cycles are negligible.



A. Clustered Networks 

We begin with N nodes. To each node u we assign two degrees, an
independent edge degree $k_I$ and a triangle degree $k_\Delta$. The
joint probability of $k_I$ and $k_\Delta$ is given by
$p(k_I, k_\Delta)$. Then u will be part of $k_\Delta$ triangles and
have $k_I$ other edges. Each triangle and edge from u will be
independent of other triangles and edges in the same way that edges in
CM networks are independent.

We create an independent stub list and a triangle stub list. We place u
into the independent stub list $k_I$ times and into the triangle stub
list $k_\Delta$ times. Once all nodes are placed into the lists, we
randomize them. We then take the pairs of nodes in positions 2n and
2n + 1 of the independent list and join them, and take the triples in
positions 3n, 3n + 1, and 3n + 2 of the triangle list and join them
into a triangle. Some repeated edges, loops, or short cycles other than
the triangles we impose may appear, but their impact is negligible as
$N \to \infty$ [24].
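
The stub-list construction can be sketched as follows. This is an illustrative implementation under stated assumptions: degrees are passed as a list of (independent degree, triangle degree) pairs, leftover stubs that do not fill a complete pair or triple are simply discarded, and repeated edges or self-loops are not removed. The function name `clustered_network` is introduced here for illustration.

```python
import random

def clustered_network(degrees, rng=random):
    """Build independent edges and triangles from stub lists.

    degrees[u] = (k_I, k_triangle) for node u (assumed input format).
    Returns (edges, triangles) as lists of node tuples.
    """
    independent_stubs, triangle_stubs = [], []
    for u, (k_i, k_tri) in enumerate(degrees):
        independent_stubs.extend([u] * k_i)    # u appears k_I times
        triangle_stubs.extend([u] * k_tri)     # u appears k_triangle times
    rng.shuffle(independent_stubs)
    rng.shuffle(triangle_stubs)
    # Positions 2n and 2n + 1 form an independent edge.
    edges = [(independent_stubs[i], independent_stubs[i + 1])
             for i in range(0, len(independent_stubs) - 1, 2)]
    # Positions 3n, 3n + 1 and 3n + 2 form a triangle.
    triangles = [tuple(triangle_stubs[i:i + 3])
                 for i in range(0, len(triangle_stubs) - 2, 3)]
    return edges, triangles

# Hypothetical example: six nodes, each with two independent edges
# and one triangle.
edges, triangles = clustered_network([(2, 1)] * 6)
```

For small N a triple may contain the same node twice; such defects become rare as N grows, in line with the remark above.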

This algorithm inevitably segregates those nodes with a high proportion
of triangles from those nodes with a low proportion of triangles. If
the degrees of nodes with many triangles differ from the degrees of
nodes with few triangles, then this effect will induce degree
correlations. In order to understand the impact of clustering, we must
be able to compare percolation in these clustered networks with
percolation in networks whose nodes are segregated in the same way.



B. Unclustered, Segregated Networks 

For comparative purposes we develop a corresponding unclustered network
with the same segregation as the clustered networks. Given the joint
distribution $p(k_I, k_\Delta)$ of independent and triangle degrees, we
create a new network where nodes are assigned blue and red degrees such
that $k_b = k_I$ and $k_r = 2k_\Delta$. The joint distribution is given
by $p_u(k_b, k_r) = p(k_b, k_r/2)$.

We proceed as before. We create a blue and red list, 
and pair nodes in positions 2n and 2n + 1 in the blue 
list and then repeat with the red list, joining pairs, not 
triples. The resulting network has the same segregation 
as the corresponding clustered network, but short cycles 
are negligible. 
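
A corresponding sketch for the unclustered, segregated construction is given below. As before, the input format, the function names `pair_stubs` and `unclustered_network`, and the choice to discard an unpaired leftover stub are assumptions made for illustration.

```python
import random

def pair_stubs(stubs, rng=random):
    """Shuffle a stub list and join positions 2n and 2n + 1 into edges."""
    stubs = list(stubs)
    rng.shuffle(stubs)
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs) - 1, 2)]

def unclustered_network(degrees, rng=random):
    """Build blue and red edges with k_b = k_I and k_r = 2 * k_triangle.

    degrees[u] = (k_I, k_triangle) for node u (assumed input format).
    Returns (blue_edges, red_edges).
    """
    blue, red = [], []
    for u, (k_i, k_tri) in enumerate(degrees):
        blue.extend([u] * k_i)
        red.extend([u] * (2 * k_tri))   # each triangle contributes two edges
    return pair_stubs(blue, rng), pair_stubs(red, rng)

# Hypothetical example matching the degrees of six nodes that each have
# two independent edges and one triangle.
blue_edges, red_edges = unclustered_network([(2, 1)] * 6)
```

Each node keeps the same total degree as in the clustered construction, but red stubs are joined in pairs so no triangles are imposed.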



II. $\mathcal{R}_0$

$\mathcal{R}_0$ is usually defined as the number of new infections
caused by an average infected individual. Occasionally alternative
definitions are used, but in some way it represents the number of new
infections attributed to an average infected individual.
$\mathcal{R}_0 = 1$ is the threshold below which epidemics are
impossible (i.e., the percolated network has no giant component). If
$\mathcal{R}_0 > 1$ then epidemics are possible, but not guaranteed.



A. Clustered Networks 

To simplify the analysis, first assume that u, v, and w are members of
a triangle and u becomes infectious first. There are multiple ways that
both v and w can become infected from edges within the triangle, but
they all have the same impact on the epidemic. It is convenient to
treat infections of v and w as if they came from u regardless of the
actual path followed.

Thus if u becomes infected, then with probability
$2T^2(1 - T) + T^2 = 3T^2 - 2T^3$ it is credited with infecting both v
and w, and with probability $2T(1 - T)^2$ it is credited with infecting
just one. With probability $(1 - T)^2$ it infects neither. These three
probabilities sum to 1. In spirit this approach is similar to that of
earlier work. For book-keeping purposes, we define the rank s of a node
as follows: the index case is given rank 0. Each other node v is then
assigned rank s equal to the length of the shortest path from the index
case to v, bearing in mind the rule above for crediting infections.
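
The crediting probabilities for a single triangle can be checked by brute-force enumeration of which of the three triangle edges transmit. The sketch below is an illustrative check, not part of the original analysis; the function name `credited_infections` is an assumption.

```python
from itertools import product

def credited_infections(T):
    """Probability that u is credited with 0, 1 or 2 infections.

    Each of the edges u-v, u-w and v-w is treated as transmitting
    independently with probability T (the percolation picture).
    """
    probs = {0: 0.0, 1: 0.0, 2: 0.0}
    for uv, uw, vw in product([True, False], repeat=3):
        weight = 1.0
        for kept in (uv, uw, vw):
            weight *= T if kept else 1.0 - T
        v_infected = uv or (uw and vw)   # directly, or via w
        w_infected = uw or (uv and vw)   # directly, or via v
        probs[int(v_infected) + int(w_infected)] += weight
    return probs

T = 0.3
probs = credited_infections(T)
expected_credited = probs[1] + 2 * probs[2]
```

Under these assumptions the expected number of infections credited to u per triangle works out to 2(T + T^2 - T^3), which reduces to 2T when the v-w edge is ignored to leading order in T.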

This allows us to define a $2 \times 2$ next-generation matrix. We
separate those nodes infected along an independent edge from those
nodes infected along a triangle edge [25]. We define $c_{II}$ and
$c_{\Delta I}$ to be the number of infections that a node infected from
an independent edge is expected to cause along independent and triangle
edges respectively. We symmetrically define $c_{I\Delta}$ and
$c_{\Delta\Delta}$. If $n_I(s)$ and $n_\Delta(s)$ are the number of
nodes of rank s which were infected along independent and triangle
edges respectively, then

\[
\begin{pmatrix} n_I(s+1) \\ n_\Delta(s+1) \end{pmatrix}
=
\begin{pmatrix} c_{II} & c_{I\Delta} \\ c_{\Delta I} & c_{\Delta\Delta} \end{pmatrix}
\begin{pmatrix} n_I(s) \\ n_\Delta(s) \end{pmatrix} .
\]

The dominant eigenvalue of this matrix is $\mathcal{R}_0$. We generally
want to determine T such that $\mathcal{R}_0 < 1$. Substituting
$\mathcal{R}_0 = 1$ into the characteristic equation gives

\[
(1 - c_{II})(1 - c_{\Delta\Delta}) - c_{I\Delta} c_{\Delta I} = 0 .
\]

The value $T = T_c$ that solves this equation is the threshold
transmissibility below which epidemics are impossible.
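
A numerical sketch of this threshold calculation is given below. The matrix entries used in the code are an assumption derived here from the crediting rule (a node reached along one stub type is chosen proportionally to that degree, each remaining independent edge transmits with probability T, and each remaining triangle yields 2(T + T^2 - T^3) credited infections on average); they are not copied from the original equations. The names `mean`, `r0_clustered`, and `threshold` are illustrative.

```python
from math import sqrt

def mean(p, f):
    """Average of f(k_I, k_triangle) over the joint distribution p."""
    return sum(prob * f(k_i, k_t) for (k_i, k_t), prob in p.items())

def r0_clustered(p, T):
    """Dominant eigenvalue of the assumed 2 x 2 next-generation matrix."""
    m_i = mean(p, lambda a, b: a)
    m_t = mean(p, lambda a, b: b)
    m_ii = mean(p, lambda a, b: a * (a - 1))
    m_tt = mean(p, lambda a, b: b * (b - 1))
    m_it = mean(p, lambda a, b: a * b)
    per_triangle = 2 * (T + T**2 - T**3)   # credited infections per triangle
    c_ii = T * m_ii / m_i if m_i else 0.0
    c_ti = per_triangle * m_it / m_i if m_i else 0.0
    c_it = T * m_it / m_t if m_t else 0.0
    c_tt = per_triangle * m_tt / m_t if m_t else 0.0
    return (c_ii + c_tt) / 2 + sqrt(((c_ii - c_tt) / 2) ** 2 + c_it * c_ti)

def threshold(r0, tol=1e-10):
    """Bisect for the T in [0, 1] with r0(T) = 1; None if r0(1) <= 1."""
    if r0(1.0) <= 1.0:
        return None
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if r0(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical example distribution p(k_I, k_triangle).
example = {(0, 3): 1 / 3, (2, 1): 1 / 3, (2, 0): 1 / 3}
T_c = threshold(lambda T: r0_clustered(example, T))
```

Bisection is used because, under the assumed entries, every matrix element is nondecreasing in T on [0, 1], so the dominant eigenvalue crosses 1 at most once.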

The original network has a giant component if $\mathcal{R}_0 > 1$ when
T = 1. Thus the conditions for a giant component are, with all entries
evaluated at T = 1,

\[
\frac{c_{II} + c_{\Delta\Delta}}{2} > 1
\]

and/or

\[
(1 - c_{II})(1 - c_{\Delta\Delta}) < c_{I\Delta} c_{\Delta I} .
\]

The only networks for which the first condition applies but not the
second are networks with enough independent edges and triangle edges
such that a giant component exists solely within the independent edges
and a giant component exists solely within the triangle edges.



B. Unclustered, Segregated Network

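We define $n_b(s)$ and $n_r(s)$ in the same manner, except that
triangles need not be considered. Then

\[
\begin{pmatrix} n_b(s+1) \\ n_r(s+1) \end{pmatrix}
=
\begin{pmatrix} c_{bb} & c_{br} \\ c_{rb} & c_{rr} \end{pmatrix}
\begin{pmatrix} n_b(s) \\ n_r(s) \end{pmatrix} ,
\]

where the entries are defined as for the clustered network, with blue
and red edges in place of independent and triangle edges, and
$c_{rr} = T \langle K_r^2 - K_r \rangle / \langle K_r \rangle$.
Substituting $\mathcal{R}_0 = 1$ into the characteristic equation gives
the epidemic threshold

\[
(1 - c_{bb})(1 - c_{rr}) - c_{br} c_{rb} = 0 .
\]
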
The network has a giant component when

\[
\frac{\langle K_b^2 - K_b \rangle}{2 \langle K_b \rangle}
+ \frac{\langle K_r^2 - K_r \rangle}{2 \langle K_r \rangle} > 1
\]

and/or

\[
(1 - c_{bb})(1 - c_{rr}) < c_{br} c_{rb} \quad \text{at } T = 1 .
\]

The difference between these conditions and those of the corresponding
clustered network comes from the fact that
$2 \langle K_\Delta^2 - K_\Delta \rangle / \langle K_\Delta \rangle
< \langle K_r^2 - K_r \rangle / \langle K_r \rangle$. From this it can
be shown that the epidemic threshold occurs at smaller T for the
unclustered network.
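
The comparison of thresholds can be explored numerically with the sketch below. The matrix entries are assumptions derived here rather than quoted: each blue or red edge transmits with probability T in the unclustered network, while in the clustered network each triangle yields 2(T + T^2 - T^3) credited infections on average. The function names and the example distribution are illustrative.

```python
from math import sqrt

def mean(p, f):
    """Average of f(k1, k2) over a joint degree distribution p."""
    return sum(prob * f(k1, k2) for (k1, k2), prob in p.items())

def r0_two_type(p, t1, t2):
    """Dominant eigenvalue when type-1 / type-2 stubs yield t1 / t2
    expected infections each (assumed form of the matrix entries)."""
    m1 = mean(p, lambda a, b: a)
    m2 = mean(p, lambda a, b: b)
    m11 = mean(p, lambda a, b: a * (a - 1))
    m22 = mean(p, lambda a, b: b * (b - 1))
    m12 = mean(p, lambda a, b: a * b)
    c11 = t1 * m11 / m1 if m1 else 0.0
    c21 = t2 * m12 / m1 if m1 else 0.0
    c12 = t1 * m12 / m2 if m2 else 0.0
    c22 = t2 * m22 / m2 if m2 else 0.0
    return (c11 + c22) / 2 + sqrt(((c11 - c22) / 2) ** 2 + c12 * c21)

def r0_clustered(p, T):
    return r0_two_type(p, T, 2 * (T + T**2 - T**3))

def r0_unclustered(p, T):
    # Blue degree k_b = k_I, red degree k_r = 2 * k_triangle.
    p_u = {(k_i, 2 * k_t): prob for (k_i, k_t), prob in p.items()}
    return r0_two_type(p_u, T, T)

def threshold(r0, tol=1e-10):
    """Bisect for the T in [0, 1] with r0(T) = 1; None if r0(1) <= 1."""
    if r0(1.0) <= 1.0:
        return None
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if r0(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical example distribution p(k_I, k_triangle).
example = {(0, 3): 1 / 3, (2, 1): 1 / 3, (2, 0): 1 / 3}
T_c_clustered = threshold(lambda T: r0_clustered(example, T))
T_c_unclustered = threshold(lambda T: r0_unclustered(example, T))
```

For this example the unclustered threshold comes out below the clustered one, consistent with the inequality stated above.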



III. CALCULATING GIANT COMPONENT SIZE

To calculate the fraction of nodes in the giant component, it suffices
to calculate the probability that a random node is not part of the
giant component. These calculations have previously been done for CM
networks.



A. Clustered Network 

We follow an approach taken in earlier work. A related approach has
also been given elsewhere.

We let f be the probability a random node u is not part of the giant
component. We have

\[
f = \sum_{k_I, k_\Delta} p(k_I, k_\Delta) \, g_I^{k_I} g_\Delta^{k_\Delta} ,
\]

where $g_I$ and $g_\Delta$ are the probabilities that an independent
edge or a triangle respectively does not connect u to the giant
component. To find $g_I$, we note that there are two ways an edge can
fail to connect u to the giant component: it may be deleted in the
percolation process with probability 1 - T, or it may be kept, but v,
the node reached, is not part of the giant component. We have

\[
g_I = 1 - T + T h_I ,
\]

where $h_I$ is the probability that a node v reached along an
independent edge is not part of the giant component. To calculate $h_I$
we note that v is selected with probability proportional to $k_I$, but
only has $k_I - 1$ susceptible neighbors along independent edges. We
get

\[
h_I = \frac{1}{\langle K_I \rangle} \sum_{k_I, k_\Delta}
k_I \, p(k_I, k_\Delta) \, g_I^{k_I - 1} g_\Delta^{k_\Delta} .
\]

For $g_\Delta$ we get

\[
g_\Delta = [1 - T + T h_\Delta]^2 - 2T^2 (1 - T) h_\Delta (1 - h_\Delta) ,
\]

where $h_\Delta$ is the probability a node reached along a triangle
edge does not connect to the giant component through any edge not in
the triangle. We find

\[
h_\Delta = \frac{1}{\langle K_\Delta \rangle} \sum_{k_I, k_\Delta}
k_\Delta \, p(k_I, k_\Delta) \, g_I^{k_I} g_\Delta^{k_\Delta - 1} .
\]

The resulting system of equations for $g_I$, $g_\Delta$, $h_I$, and
$h_\Delta$ can be solved iteratively, and the result gives f.
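
One way to carry out the iteration is sketched below, using the equations for f, the two g probabilities, and the two h probabilities exactly as stated. The starting point h = 0, the fixed iteration count, and the function names are assumptions of this sketch; starting from zero is chosen so that the iterates increase toward the smallest fixed point.

```python
def edge_probabilities(T, h_i, h_t):
    """g for an independent edge and for a triangle, given h values."""
    g_i = 1 - T + T * h_i
    g_t = (1 - T + T * h_t) ** 2 - 2 * T**2 * (1 - T) * h_t * (1 - h_t)
    return g_i, g_t

def giant_component_clustered(p, T, iterations=5000):
    """Return 1 - f for a joint distribution p[(k_I, k_triangle)]."""
    mean_i = sum(prob * k_i for (k_i, k_t), prob in p.items())
    mean_t = sum(prob * k_t for (k_i, k_t), prob in p.items())
    h_i = h_t = 0.0
    for _ in range(iterations):
        g_i, g_t = edge_probabilities(T, h_i, h_t)
        if mean_i:
            # Terms with k_I = 0 vanish and are skipped explicitly.
            h_i = sum(prob * k_i * g_i ** (k_i - 1) * g_t ** k_t
                      for (k_i, k_t), prob in p.items() if k_i) / mean_i
        if mean_t:
            h_t = sum(prob * k_t * g_i ** k_i * g_t ** (k_t - 1)
                      for (k_i, k_t), prob in p.items() if k_t) / mean_t
    g_i, g_t = edge_probabilities(T, h_i, h_t)
    f = sum(prob * g_i ** k_i * g_t ** k_t for (k_i, k_t), prob in p.items())
    return 1 - f

# Hypothetical example distribution p(k_I, k_triangle).
example = {(0, 3): 1 / 3, (2, 1): 1 / 3, (2, 0): 1 / 3}
size_low = giant_component_clustered(example, 0.5)
size_high = giant_component_clustered(example, 0.9)
```

Convergence is slow close to the threshold, so near-critical values of T may need more iterations than the default used here.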



B. Unclustered, Segregated Network 

To find $f_u$, the probability a random node in the unclustered network
is not part of the giant component, we proceed similarly. We find

\[
f_u = \sum_{k_b, k_r} p_u(k_b, k_r) \, g_b^{k_b} g_r^{k_r} ,
\]
\[
g_b = 1 - T + T h_b ,
\]
\[
g_r = 1 - T + T h_r ,
\]
\[
h_b = \frac{1}{\langle K_b \rangle} \sum_{k_b, k_r}
k_b \, p_u(k_b, k_r) \, g_b^{k_b - 1} g_r^{k_r} ,
\]
\[
h_r = \frac{1}{\langle K_r \rangle} \sum_{k_b, k_r}
k_r \, p_u(k_b, k_r) \, g_b^{k_b} g_r^{k_r - 1} .
\]

It can be shown that $f_u < f$ for the equivalent degree distributions.
Consequently the size of the giant component is smaller in clustered
networks than in unclustered networks of the same degree distribution
and degree correlations.
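
The unclustered system can be iterated in the same way. The sketch below assumes the standard two-type form for the blue and red neighbor probabilities (a neighbor reached along a blue edge is chosen proportionally to its blue degree, and likewise for red); the function name, the starting point h = 0, and the iteration count are illustrative choices.

```python
def giant_component_unclustered(p, T, iterations=5000):
    """Return 1 - f_u, given the clustered distribution p[(k_I, k_triangle)].

    The unclustered distribution uses k_b = k_I and k_r = 2 * k_triangle.
    """
    p_u = {(k_i, 2 * k_t): prob for (k_i, k_t), prob in p.items()}
    mean_b = sum(prob * k_b for (k_b, k_r), prob in p_u.items())
    mean_r = sum(prob * k_r for (k_b, k_r), prob in p_u.items())
    h_b = h_r = 0.0
    for _ in range(iterations):
        g_b = 1 - T + T * h_b
        g_r = 1 - T + T * h_r
        if mean_b:
            h_b = sum(prob * k_b * g_b ** (k_b - 1) * g_r ** k_r
                      for (k_b, k_r), prob in p_u.items() if k_b) / mean_b
        if mean_r:
            h_r = sum(prob * k_r * g_b ** k_b * g_r ** (k_r - 1)
                      for (k_b, k_r), prob in p_u.items() if k_r) / mean_r
    g_b = 1 - T + T * h_b
    g_r = 1 - T + T * h_r
    f_u = sum(prob * g_b ** k_b * g_r ** k_r
              for (k_b, k_r), prob in p_u.items())
    return 1 - f_u

# Hypothetical example distribution p(k_I, k_triangle).
example = {(0, 3): 1 / 3, (2, 1): 1 / 3, (2, 0): 1 / 3}
size_unclustered = giant_component_unclustered(example, 0.5)
```

Running this alongside an iteration of the clustered equations for the same distribution and the same T gives a direct numerical view of the difference in giant component size.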



IV. RESULTS 

In Fig. 1 we consider outbreak spread on three networks, all of which
have the same degree distribution. We compare simulated epidemic sizes
with predictions from the clustered equations, the unclustered,
segregated equations, and the equations derived previously for
configuration model networks.

FIG. 1: A comparison of different network configurations. Assortative
mixing reduces the epidemic threshold. Clustering reduces epidemic
size.

The nodes are equally distributed between degrees 2, 4, and 6. In each
network the clustering is distributed differently. In the first,
p(0, 3) = 1/3, p(2, 1) = 1/3, and p(2, 0) = 1/3. That is, nodes of
degree 6 are only in triangles, nodes of degree 4 have half of their
edges in triangles and half as independent edges, and nodes of degree 2
have just independent edges. High degree nodes tend to be clustered and
contact other high degree nodes. The tendency to contact other high
degree nodes reduces the epidemic threshold, but the clustering raises
the threshold.



In the second network, we take p(2, 0) = 1/6, p(0, 1) = 1/6,
p(2, 1) = 1/3, p(4, 1) = 1/4, and p(0, 3) = 1/12. This yields an
identical distribution of neighbor degrees (1/6, 1/3, and 1/2 for
degrees 2, 4, and 6) for nodes reached by either a triangle or an
independent edge. The unclustered, segregated equations yield the same
result as the configuration model equations. The clustered calculations
have smaller epidemics.

The third network is an inversion of the first. Nodes with high degree
have independent edges while nodes with low degree are clustered. We
take p(6, 0) = 1/3, p(2, 1) = 1/3, and p(0, 1) = 1/3. Again the
assortativity reduces the epidemic threshold while clustering reduces
the epidemic size. In this particular case, it is the preference for
high degree nodes (which are unclustered) to contact one another that
leads to the reduction in epidemic threshold, and so it is clear that
the effect is due to assortative mixing, not clustering.
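
The degree structure of the first and third networks can be checked with exact fractions. The sketch below is an illustrative check; the function names are assumptions, and a node's total degree is taken to be its independent degree plus twice its triangle degree.

```python
from fractions import Fraction

def degree_distribution(p):
    """Distribution of total degree k_I + 2 * k_triangle."""
    dist = {}
    for (k_i, k_t), prob in p.items():
        k = k_i + 2 * k_t
        dist[k] = dist.get(k, 0) + prob
    return dist

def neighbor_degree_distribution(p, edge_type):
    """Total-degree distribution of a node reached along an independent
    edge (edge_type = 0) or a triangle edge (edge_type = 1)."""
    weights = {}
    for degrees, prob in p.items():
        weight = degrees[edge_type] * prob   # proportional to stub count
        if weight:
            k = degrees[0] + 2 * degrees[1]
            weights[k] = weights.get(k, 0) + weight
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

third = Fraction(1, 3)
first_network = {(0, 3): third, (2, 1): third, (2, 0): third}
third_network = {(6, 0): third, (2, 1): third, (0, 1): third}

via_independent = neighbor_degree_distribution(first_network, 0)
via_triangle = neighbor_degree_distribution(first_network, 1)
```

For the first network, independent edges lead only to nodes of degree 2 or 4 while triangle edges lead mostly to nodes of degree 6, which is the segregation responsible for the assortative mixing.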



V. DISCUSSION 

We have introduced a new model of clustered networks on which we study
percolation and epidemics. This model allows us to make a number of
analytic predictions because the edges of the network can be
partitioned into sets which are independent of one another (independent
edges or triangles).

We have shown that these networks can have a lower epidemic threshold
than Configuration Model networks with the same degree distribution.
However, this is not a consequence of clustering, but rather a
consequence of assortative mixing. The clustering of the network can be
proven to raise the epidemic threshold and reduce the epidemic size
relative to networks with the same degree correlations, but without
clustering.